An effective framework for supervised dimension reduction
نویسندگان
چکیده
We consider supervised dimension reduction (SDR) for problems with discrete inputs. Existing methods are computationally expensive, and often do not take the local structure of data into consideration when searching for a low-dimensional space. In this paper, we propose a novel framework for SDR with the aims that it can inherit scalability of existing unsupervised methods, and that it can exploit well label information and local structure of data when searching for a new space. The way we encode local information in this framework ensures three effects: preserving inner-class local structure, widening inter-class margin, and reducing possible overlap between classes. These effects are vital for the success in practice. The framework is general and flexible so that it can be easily adapted to various unsupervised topic models. We then adapt our framework to three unsupervised topic models which results in three methods for SDR. Extensive experiments on 10 practical domains demonstrate that our framework can yield scalable and qualitative methods for SDR. In particular, one of the adapted methods can perform consistently better than the state-of-the-art method for SDR while enjoying 30-450 times faster speed.
منابع مشابه
Gradient-based kernel dimension reduction for supervised learning
This paper proposes a novel kernel approach to linear dimension reduction for supervised learning. The purpose of the dimension reduction is to find directions in the input space to explain the output as effectively as possible. The proposed method uses an estimator for the gradient of regression function, based on the covariance operators on reproducing kernel Hilbert spaces. In comparison wit...
متن کاملSemi-supervised learning with Gaussian fields
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. This paper presents two contributions. First, we show how the GF framework can be used for regression tasks on high-dimensional data. We consider an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Second, we show h...
متن کاملTwo models for Bayesian supervised dimension reduction
We study and develop two Bayesian frameworks for supervised dimension reduction that apply to nonlinear manifolds: Bayesian mixtures of inverse regressions and gradient based methods. Formal probabilistic models with likelihoods and priors are given for both methods and efficient posterior estimates of the effective dimension reduction space and predictive factors can be obtained by a Gibbs sam...
متن کاملGaussian fields for semi-supervised regression and correspondence learning
Gaussian fields (GF) have recently received considerable attention for dimension reduction and semi-supervised classification. In this paper we show how the GF framework can be used for semi-supervised regression on high-dimensional data. We propose an active learning strategy based on entropy minimization and a maximum likelihood model selection method. Furthermore, we show how a recent genera...
متن کاملSpam Filtering Based on Supervised Latent Semantic Features Extraction
Spam text is an universal phenomenon on the “open web”, including large-scale email systems and the growing number of Blogs. Handling this information overload is becoming an increasingly challenging problem, A promising approach is the using of content-based filtering. In this paper, our focus is placed on finding effective dimension reduction method for email Spam filtering, we apply a superv...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Neurocomputing
دوره 139 شماره
صفحات -
تاریخ انتشار 2014